[Frontend] Automatic detection of chat content format from AST #9919
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
- Add the `ready` label to the PR
- Enable auto-merge.

🚀
Great idea with the PR @DarkLight1337!
Right now I am thinking of using Jinja's AST parser and working off that. The basic idea is to detect whether the template iterates over `message['content']` as a list.
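For context, the two content formats being distinguished look like this (illustrative values only, not part of the PR):

```python
# "string" format: content is a plain string (classic text-only templates).
string_format_message = {
    "role": "user",
    "content": "What's in this image?",
}

# "openai" format: content is a list of typed parts, used by multimodal
# templates that iterate over message['content'].
openai_format_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
```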
vllm/entrypoints/chat_utils.py
Outdated
```python
def _is_var_access(node: jinja2.nodes.Node, varname: str) -> bool:
    if isinstance(node, jinja2.nodes.Name):
        return node.ctx == "load" and node.name == varname

    return False


def _is_attr_access(node: jinja2.nodes.Node, varname: str, key: str) -> bool:
    if isinstance(node, jinja2.nodes.Getitem):
        return (node.ctx == "load" and _is_var_access(node.node, varname)
                and isinstance(node.arg, jinja2.nodes.Const)
                and node.arg.value == key)

    if isinstance(node, jinja2.nodes.Getattr):
        return (node.ctx == "load" and _is_var_access(node.node, varname)
                and node.attr == key)

    return False


def _iter_nodes_define_message(chat_template_ast: jinja2.nodes.Template):
    # Search for {%- for message in messages -%} loops
    for loop_ast in chat_template_ast.find_all(jinja2.nodes.For):
        loop_iter = loop_ast.iter
        loop_target = loop_ast.target

        if _is_var_access(loop_iter, "messages"):
            assert isinstance(loop_target, jinja2.nodes.Name)
            yield loop_ast, loop_target.name


def _iter_nodes_define_content_item(chat_template_ast: jinja2.nodes.Template):
    for node, message_varname in _iter_nodes_define_message(chat_template_ast):
        # Search for {%- for content in message['content'] -%} loops
        for loop_ast in node.find_all(jinja2.nodes.For):
            loop_iter = loop_ast.iter
            loop_target = loop_ast.target

            if _is_attr_access(loop_iter, message_varname, "content"):
                assert isinstance(loop_target, jinja2.nodes.Name)
                yield loop_iter, loop_target.name


def _detect_content_format(
    chat_template: str,
    *,
    default: _ChatTemplateContentFormat,
) -> _ChatTemplateContentFormat:
    try:
        jinja_compiled = hf_chat_utils._compile_jinja_template(chat_template)
        jinja_ast = jinja_compiled.environment.parse(chat_template)
    except Exception:
        logger.exception("Error when compiling Jinja template")
        return default

    try:
        next(_iter_nodes_define_content_item(jinja_ast))
    except StopIteration:
        return "string"
    else:
        return "openai"
```
This handles the most common case of iterating through OpenAI-formatted `message['content']` as a list, assuming that no relevant variable reassignments are made other than those in the for loops. Please tell me if you are aware of any chat templates that don't work with this code.
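To illustrate what the AST walk matches, here are two minimal hypothetical templates (not taken from any real model) and the nested loop that `_iter_nodes_define_content_item` looks for, using plain `jinja2`:

```python
import jinja2

# Content used directly as a string -> would be classified as "string".
STRING_TEMPLATE = """\
{%- for message in messages -%}
{{ message['role'] }}: {{ message['content'] }}
{%- endfor -%}
"""

# Iterates over message['content'] -> would be classified as "openai".
OPENAI_TEMPLATE = """\
{%- for message in messages -%}
{%- for content in message['content'] -%}
{%- if content['type'] == 'text' %}{{ content['text'] }}{% endif -%}
{%- endfor -%}
{%- endfor -%}
"""

env = jinja2.Environment()
for name, source in [("string", STRING_TEMPLATE), ("openai", OPENAI_TEMPLATE)]:
    ast = env.parse(source)
    # The detector walks For nodes like this; only the second template
    # contains a nested loop whose iterable is message['content'].
    loops = list(ast.find_all(jinja2.nodes.For))
    print(name, "template has", len(loops), "for-loop(s)")
```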
```diff
@@ -380,10 +521,7 @@ def load_chat_template(
     # If opening a file fails, set chat template to be args to
     # ensure we decode so our escape are interpreted correctly
     resolved_chat_template = codecs.decode(chat_template, "unicode_escape")

-    logger.info("Using supplied chat template:\n%s", resolved_chat_template)
```
This logging line has been moved to vllm/entrypoints/openai/api_server.py.
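As background for the `unicode_escape` decode in that hunk, here is a minimal sketch (variable names are mine) of why templates supplied as CLI arguments need it:

```python
import codecs

# A chat template passed as a shell argument arrives with literal
# backslash-escapes, e.g. the two characters '\' and 'n' instead of a newline.
raw_arg = "{{ messages[0]['content'] }}\\n"
resolved = codecs.decode(raw_arg, "unicode_escape")

assert "\\n" not in resolved    # the escape sequence was interpreted...
assert resolved.endswith("\n")  # ...into a real newline character
```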
```python
chat_template: Optional[str] = Field(
    default=None,
    description=(
        "A Jinja template to use for this conversion. "
        "As of transformers v4.44, default chat template is no longer "
        "allowed, so you must provide a chat template if the tokenizer "
        "does not define one."),
)
chat_template_kwargs: Optional[Dict[str, Any]] = Field(
    default=None,
    description=("Additional kwargs to pass to the template renderer. "
                 "Will be accessible by the chat template."),
)
```
These arguments are present in other chat-based APIs, so I added them here as well.
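For example, a request body could use these fields like so (the payload shape around them is assumed; only the two fields come from the diff above):

```python
# Hypothetical request payload; only chat_template and chat_template_kwargs
# are defined by the diff above.
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "chat_template": (
        "{%- for message in messages -%}"
        "{{ message['role'] }}: {{ message['content'] }}\n"
        "{%- endfor -%}"
    ),
    # Extra variables made visible to the template renderer.
    "chat_template_kwargs": {"custom_system_note": "Be concise."},
}
```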
The --chat-template-text-format branch was force-pushed from 8ce013b to e262745.
This pull request has merge conflicts that must be resolved before it can be merged.
The branch was force-pushed from e262745 to c37af03.
Signed-off-by: DarkLight1337 <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: DarkLight1337 <[email protected]>
@maxdebayser does this look good to you now?
@DarkLight1337, I've left a few comments. I think the one about the assignment search is worthy of your consideration, but other than that it looks good to me.
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 looks like there's one test failure remaining
The network is quite slow right now (HF keeps timing out for a lot of other PRs). This error comes from not being able to download the video before the timeout occurs. (It passes when I run it locally.) Can you approve this PR? Then I'll retry the CI once the network returns to normal.
Thanks @DarkLight1337 @maxdebayser!
This PR renames `--chat-template-text-format` (introduced by #9358) to `--chat-template-content-format` and moves it to the CLI parser specific to the OpenAI-compatible server. It also removes the redundant hardcoded logic for Llama-3.2-Vision (last updated by #9393), since we can now run online inference with `--chat-template-content-format openai`. To avoid breaking how users are currently serving Llama-3.2-Vision, I have added code to automatically detect the format to use based on the AST of the provided chat template.
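For example, serving with the renamed flag might look like this (the model ID and wrapping the launch in `subprocess` are illustrative; only the flag itself comes from this PR):

```python
import subprocess

# Hypothetical launch of the OpenAI-compatible server with the renamed flag.
subprocess.run([
    "vllm", "serve", "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "--chat-template-content-format", "openai",
])
```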
cc @vrdn-23 @ywang96 @heheda12345 @alex-jw-brooks
FIX #10286